Reproducing Montaño-Centellas et al., 2020

Semester Project, BIOL 7800 Reproducible Research

Karthik Thrikkadeeri

Montaño Lab, Louisiana State University

April 29, 2025

Original study

  • Functional & phylogenetic vs taxonomic
  • General global pattern of diversity with elevation

  • Bird community data
  • Resolution: 100 m elevational bands
Figure 1: Locations of the 46 elevational gradients used in the study.

  1. Four diversity metrics
  2. Quadratic regression with elevation
  3. Classify curve shapes

  1. Four diversity metrics
  2. Quadratic regression with elevation
  3. Classify curve shapes
  4. Consistency across mountains & metrics

1. Elevational patterns of diversity

Figure 2: Patterns of bird functional and phylogenetic diversity along elevational gradients.

2. Deterministic strength

2. Deterministic strength

Figure 3: Global trends of standardised effect sizes of diversity metrics against standardised elevation, as predicted from quadratic regressions and a global quadratic mixed-effects model.

3. Latitudinal effects

3. Latitudinal effects

Figure 4: Effect of latitude on deterministic strength of elevational trends in diversity.

My process

Starting point

No code available.

No raw data available.

Processed dataset archived on Dryad.

read_xlsx("data/montano_dryad_data.xlsx", # single Excel file
          sheet = "FDPD_Supporting") %>% # sheet name
  slice_sample(n = 6)
# A tibble: 6 × 15
  Mountain  elevation Richness  FRic SES.FRic p.SESFRic  FDis SES.FDis p.SESFDis
  <chr>         <dbl>    <dbl> <dbl>    <dbl>     <dbl> <dbl>    <dbl>     <dbl>
1 Undzungwa      1800       40  7.91   -0.703     0.384  1.33   -0.952     0.196
2 Mulu              0      134 20.8     0.190     0.464  1.47   -0.272     0.38 
3 Kilimanj…      3200       16  2.34   -0.890     0.148  1.42   -0.315     0.426
4 Undzungwa       800       30 18.6     0.959     0.694  1.56    0.621     0.68 
5 Totare          200       87 25.5     0.399     0.65   1.66    1.60      0.94 
6 Yosemite        100       57 26.9     1.01      0.718  1.60    1.34      0.908
# ℹ 6 more variables: PD <dbl>, SES.PD <dbl>, p.SESPD <dbl>, MPD <dbl>,
#   SES.MPD <dbl>, p.SESMPD <dbl>

Starting point

No latitude information.

Cannot test latitudinal effects (Figure 4).

Can reproduce first two analyses (Figure 2, Figure 3).

Reproduction

1. Elevational patterns

Fitting models

# iterate over each mountain in the dataset:
# fit LM of each metric against linear and second-order polynomial terms of Elevation
# extract intercept and coefficients for both Elevation terms
# aggregate all these values over all mountains

list_mount <- unique(data_dryad$Mountain)
metric_cols <- c("FRic", "FDis", "PD", "MPD")
to_iter <- expand.grid(mountain = list_mount, metric = metric_cols)

quad_reg <- purrr::map2(to_iter$mountain, to_iter$metric, ~ {
  
  data_mount <- data_dryad %>% filter(Mountain == .x)
  model_formula <- as.formula(glue("{.y} ~ elevation + I(elevation^2)"))
  model_mount <- lm(model_formula, data_mount)
  
  # predict only for elevations present in each mountain
  to_pred <- data_mount %>% distinct(Mountain, elevation)
  # predict
  model_pred <- predict(model_mount, newdata = to_pred, type = "response")
  pred <- bind_cols(to_pred, pred = model_pred)
  
  # store p-value of model fit
  p <- broom::glance(model_mount) %>% pull(p.value)

  # bind all outputs together
  t(model_mount$coefficients) %>% 
    as_tibble() %>% 
    magrittr::set_colnames(c("intercept", "coef_linear", "coef_poly")) %>% 
    bind_cols(tibble(mountain = .x, metric = .y, pval = p)) %>% 
    left_join(pred, by = c("mountain" = "Mountain"))
  
}) %>% 
  list_rbind() %>% 
  relocate(metric, mountain) # bring metric and mountain cols to the start

Plotting predicted patterns

Code
quad_reg %>% 
  ggplot(aes(x = elevation, y = pred, col = mountain)) +
  geom_line() +
  facet_wrap(~ metric, ncol = 1, scales = "free_y", strip.position = "left") + 
  labs(x = "Elevation", y = "") +
  # scale_col_pal + # only makes sense to colour scheme after classifying into 8 categs
  theme(panel.grid.major.x = element_line(colour = "grey", linetype = "dotted"),
        # match strip format in paper
        strip.background = element_rect(colour = NA),
        strip.placement = "outside",
        strip.text = element_text(size = 10),
        # remove colour key
        legend.position = "none") 

Classifying elevational patterns

# classify elevational trends
quad_reg_class <- quad_reg %>% 
  get_inflection() %>% 
  get_elev_cats() %>% # low, mid, high
  class_valley_peak() %>% 
  group_by(metric, mountain) %>% 
  reframe(trend = case_when(
    min(pval) > 0.1 ~ "NS",
    
    all(inflection == FALSE) & all(pred_diff >= 0) ~ "Increasing",
    all(inflection == FALSE) & all(pred_diff <= 0) ~ "Decreasing",
    
    all(valleypeak == "valley") & all(!(inflection == TRUE & elev_cat != "LOW")) ~ "Low Valley",
    all(valleypeak == "valley") & all(!(inflection == TRUE & elev_cat != "HIGH")) ~ "High Valley",
    all(valleypeak == "valley") & all(!(inflection == TRUE & elev_cat != "MID")) ~ "Mid Valley",
    
    all(valleypeak == "peak") & all(!(inflection == TRUE & elev_cat != "LOW")) ~ "Low Peak",
    all(valleypeak == "peak") & all(!(inflection == TRUE & elev_cat != "HIGH")) ~ "High Peak",
    all(valleypeak == "peak") & all(!(inflection == TRUE & elev_cat != "MID")) ~ "Mid Peak",
    
    TRUE ~ NA
  )) 

Code
# plot frequency bar graph of different trend types
quad_reg_class %>% 
  mutate(trend = factor(trend, 
                         levels = c("Increasing", "Decreasing", "Mid Peak", "Mid Valley",
                                    "Low Peak", "Low Valley", "High Peak", "High Valley", "NS"))) %>% 
  group_by(metric) %>% 
  count(trend) %>% 
  ggplot() +
  geom_col(aes(x = trend, y = n, fill = trend)) +
  facet_wrap(~ metric, ncol = 1, strip.position = "left") +
  labs(x = "Trend type", y = "") +
  scale_fill_pal +
  theme(legend.position = "none",
        # match strip format in paper
        strip.background = element_rect(colour = NA),
        strip.placement = "outside",
        strip.text = element_text(size = 10))

2. Deterministic strength

Fitting models

Individual mountain models

Code
# iterate over each mountain in the dataset:
# fit LM of each metric against linear and second-order polynomial terms of scaled elevation
# extract model predictions for each mountain

metric_sess <- glue("SES.{metric_cols}")
to_iter <- expand.grid(mountain = list_mount, metric = metric_sess)

quad_reg_ses <- map2(to_iter$mountain, to_iter$metric, ~ {
  
  data_mount <- data_dryad %>% filter(Mountain == .x)
  model_formula <- as.formula(glue("{.y} ~ elevation_scaled + I(elevation_scaled^2)"))
  model_mount <- lm(model_formula, data_mount)
  
  # predict only for elevations present in each mountain
  to_pred <- data_mount %>% distinct(Mountain, elevation_scaled)
  # predict
  model_pred <- predict(model_mount, newdata = to_pred, type = "response")
  pred <- bind_cols(to_pred, pred = model_pred)
  
  tibble(mountain = .x, metric = .y) %>% 
    left_join(pred, by = c("mountain" = "Mountain"))
  
}) %>% 
  list_rbind() %>% 
  relocate(metric, mountain) %>% # bring metric and mountain cols to the start
  mutate(metric = gsub("\\.", " ", metric)) %>% # remove period from metric name
  mutate(metric = factor(metric,
                         levels = c("SES FRic", "SES FDis", "SES PD", "SES MPD")))

# summarise original data to plot raw data points

data_dryad_ses <- data_dryad %>% 
  select(Mountain, elevation_scaled, starts_with("SES")) %>% 
  pivot_longer(cols = starts_with("SES"), 
               names_to = "metric", values_to = "obs") %>% 
  rename(mountain = Mountain) %>% 
  mutate(metric = gsub("\\.", " ", metric)) %>% # remove period from metric name
  mutate(metric = factor(metric,
                         levels = c("SES FRic", "SES FDis", "SES PD", "SES MPD")))

Global mixed-effects model

Code
quad_lmm_ses_output <- map(metric_sess, ~ {
  
  model_formula <- as.formula(glue("{.x} ~ elevation_scaled + I(elevation_scaled^2) + (1 | Mountain)"))
  model_ses <- lmer(model_formula, data_dryad, REML = FALSE) # because we use Likelihood Ratio Test
  
  # predict only for elevations present in each mountain
  to_pred <- data_dryad %>% distinct(elevation_scaled)
  # predict
  model_pred <- predict(model_ses, newdata = to_pred, type = "response", 
                        re.form = NA)
  pred <- bind_cols(to_pred, pred = model_pred) %>% mutate(metric = .x)
  
  # also generate model summary for Table 1
  summ <- gen_mem_summ(model_ses) %>% mutate(metric = .x)
  
  # return both
  list(predicted = pred,
       summary = summ)
  
})

# model predictions
quad_lmm_ses <- map(quad_lmm_ses_output, "predicted") %>% 
  list_rbind() %>% 
  relocate(metric) %>%  # bring metric col to the start
  mutate(metric = gsub("\\.", " ", metric)) %>% # remove period from metric name
  mutate(metric = factor(metric,
                         levels = c("SES FRic", "SES FDis", "SES PD", "SES MPD")))

# model summary
quad_lmm_ses_summ <- map(quad_lmm_ses_output, "summary") %>%
  list_rbind() %>% 
  relocate(metric) %>% 
  # refining formatting of labels and values
  mutate(metric = gsub("\\.", " ", metric), # remove period from metric name
         term = str_replace(term, "elevation_scaled", "Elevation")) %>% 
  mutate(term = if_else(str_detect(term, "\\("),
                        str_match(term, "\\(([^()]*)\\)")[, 2],
                        term) %>% 
         str_replace_all("\\^2", "<sup>2</sup>")) %>% 
  mutate(term = str_replace(term, "Mountain", "Mountain (random effect)")) %>% 
  pivot_wider(names_from = "metric", values_from = "value") %>% 
  rename(Terms = term)

Lots of models!

# (elev. patt. for each mountain) + ((det. strength for each mountain) + (global det. strength))
how_many_models <- (dim(to_iter)[1]) + (dim(to_iter)[1] + length(metric_sess))
how_many_models
[1] 372

fig3_panel <- function(which_metric) {
  
  (
    # base plot
    ggplot() +
      # raw data points
      geom_point(data = data_dryad_ses %>% filter(metric == which_metric), 
                 col = "grey", alpha = 0.5,
                 mapping = aes(x = elevation_scaled, y = obs, group = mountain)) +
      # individual mountain model predictions (blue line)
      geom_line(data = quad_reg_ses %>% filter(metric == which_metric), 
                col = "#00538F",
                mapping = aes(x = elevation_scaled, y = pred, group = mountain)) +
      geom_hline(yintercept = 0) +
      # overall LMM prediction across all mountains (black line)
      geom_line(data = quad_lmm_ses %>% filter(metric == which_metric), 
                linewidth = 1.25,
                mapping = aes(x = elevation_scaled, y = pred)) +
      labs(x = "Elevation (scaled)", y = which_metric) +
      # match strip format in paper
      theme(axis.title.y = element_text(size = 10))
  ) %>% 
    # include marginal histograms
    ggExtra::ggMarginal(type = "histogram", fill = "grey")
  
}

metric_sess_form <- gsub("\\.", " ", metric_sess)

(wrap_elements(fig3_panel(metric_sess_form[1])) | 
  wrap_elements(fig3_panel(metric_sess_form[2]))) /
  (wrap_elements(fig3_panel(metric_sess_form[3])) | 
     wrap_elements(fig3_panel(metric_sess_form[4]))) &
  plot_annotation(tag_levels = "a", tag_prefix = "(", tag_suffix = ")") &
  theme(plot.tag.position = c(0.05, 0.95)) # align tags with axis labels

Plotting everything

knitr::kable(quad_lmm_ses_summ)
Table 1: Parameter estimates (with SE) of mixed-effects models. The last row shows variances of the random effect term, Mountain.
Terms SES FRic SES FDis SES PD SES MPD
Intercept -0.007 ± 0.111 -0.439 ± 0.118 -3.212 ± 0.222 -2.023 ± 0.268
Elevation -0.035 ± 0.028 0.004 ± 0.03 0.285 ± 0.036** -0.177 ± 0.034**
Elevation2 -0.067 ± 0.032* 0.2 ± 0.034** 0.496 ± 0.041** 0.367 ± 0.039**
Mountain (random effect) 0.487 0.541 2.135 3.192

Conclusion

✓ Elevational patterns of diversity

✓ Deterministic strength

✗ Latitudinal effects

Hurdles & learnings

  • Reproduction with gaps
  • Lack of information on classification of trends
  • Manual vs programmatic discrepancy
  • “normalized” vs “scaled” vs “scaled and centred”
  • Broader issues
    • Journal restrictions (even supp.)
    • Decoupling of code & outputs from text
    • Pressure to release “final product”
  • Think of the reproducibility spectrum!

What went smooth?

  • Having no (bad) code!
  • 372 individual models but purrred worries away!

Favourite new tool

plotly_test <- (quad_reg_class %>% 
  mutate(trend = factor(
    trend, 
    levels = c("Increasing", "Decreasing", "Mid Peak", "Mid Valley",
               "Low Peak", "Low Valley", "High Peak", "High Valley", "NS")
  )) %>% 
  group_by(metric) %>% 
  count(trend) %>% 
  ggplot() +
  geom_col(aes(x = trend, y = n, fill = trend)) +
  facet_wrap(~ metric, ncol = 1, strip.position = "left") +
  labs(x = "Trend type", y = "") +
  scale_fill_pal +
  theme(legend.position = "none",
        strip.background = element_rect(colour = NA),
        strip.placement = "outside",
        strip.text = element_text(size = 10))) %>% 
  ggplotly()

Thank you!